Load and Tidy the shot data

library(tidyverse)
library(plotly)

theme_set(theme_minimal())

shot_data <- readxl::read_excel("Sample Player Test Data.xlsx")

shot_data <- shot_data %>% 
  mutate_at(c("Player", "Club"), factor) 

Assuming that the goal of any club test is to find better accuracy, more distance, or some combination of both. I’m choosing to start by looking at accuracy.

Average distance offline for each club

shot_data %>% 
  select(Player, Club, `Actual Offline`) %>%
  group_by(Club) %>% 
  summarise(Avg_Miss = mean(`Actual Offline`, na.rm = TRUE))
## # A tibble: 2 x 2
##   Club   Avg_Miss
##   <fct>     <dbl>
## 1 Club A    6.76 
## 2 Club B   -0.147

It looks like there might be one club that is much closer to the target than the other. Next, I want to see what the distribution of this offline data looks like.

pl <- ggplot(shot_data, aes(`Actual Offline`)) +
  geom_density() +
  geom_vline(xintercept = mean(shot_data$`Actual Offline`), color = 'red', alpha = 0.34) +
  scale_x_continuous(limits = c(-70, 70))

ggplotly(pl)

Looks more or less normal which means that I can use a T-Test to verify what the simple means above are hinting at. First I’ll check that the distribution is normal using a Shapiro-Wilk test.

shapiro.test(shot_data$`Actual Offline`)
## 
##  Shapiro-Wilk normality test
## 
## data:  shot_data$`Actual Offline`
## W = 0.99707, p-value = 0.6948

T-Test:

t_test <- t.test(shot_data$`Actual Offline`~shot_data$Club)
t_test
## 
##  Welch Two Sample t-test
## 
## data:  shot_data$`Actual Offline` by shot_data$Club
## t = 3.9666, df = 397.99, p-value = 8.647e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.484323 10.332125
## sample estimates:
## mean in group Club A mean in group Club B 
##            6.7615757           -0.1466481

Cool, so it looks like Club B is up closer to the target on average for most of the participants. I like to see results like this visually so I will now make a couple of visuals to help enforce this result.

Boxplot showing the range of the actual offline shot for each club

p <- shot_data %>% 
  group_by(Player, Club) %>% 
  summarise(Avg_Miss = mean(`Actual Offline`)) %>% 
  ungroup() %>% 
  ggplot(aes(Club, Avg_Miss, col = Club)) + 
    geom_boxplot() +
    coord_flip()

ggplotly(p)

Plot of the average offline total per each player

offline <- shot_data %>% 
  group_by(Player, Club) %>% 
  summarise(Avg_Miss = mean(`Actual Offline`)) %>% 
  ungroup()

plot <- ggplot(offline, aes(Club, Avg_Miss, group = Player)) +
    geom_point(aes(text = paste("Avg Miss: ", Avg_Miss, ", Player: ", Player), color = Player, size = 3, alpha = 0.5)) +
    geom_line(aes(color = Player, alpha = 0.5)) +
    coord_flip() +
    theme(legend.position = "none")

ggplotly(plot, tooltip = c("text"))

Pretty clear that Club B is biasing shots more left than Club A. There could be some left-handed players which might explain some of the players going farther right than they started. I want to see if there was any effect on the top ten players with the most significant accuracy gains as far as distance is concerned.

diff_df <- shot_data %>% 
  group_by(Player, Club) %>% 
  summarise(Avg_Miss = mean(`Actual Offline`)) %>% 
  spread(Club, Avg_Miss) %>% 
  mutate(Diff_Miss = `Club A` - `Club B`) %>% 
  arrange(desc(abs(Diff_Miss)))

dva <- shot_data %>% 
  right_join(diff_df) %>% 
  select(Player, Club, `Actual Carry`, Diff_Miss) %>% 
  group_by(Player, Club, Diff_Miss) %>% 
  summarise(Avg_carry = mean(`Actual Carry`)) %>% 
  spread(Club, Avg_carry) %>% 
  mutate(Diff_Carry = `Club A` - `Club B`) %>% 
  select(Player, Diff_Miss, Diff_Carry) %>% 
  arrange(desc(abs(Diff_Carry)))
## Joining, by = "Player"

Interestingly it seems that the players with the biggest accuracy gains had the smallest change in carry yards in the group.

ggplot(dva, aes(Diff_Miss, Diff_Carry)) +
  geom_point(aes(col = Player, size = 3, alpha = 0.34)) + 
  geom_smooth(method = lm, color = "FireBrick") +
  theme(legend.position = 'none')

A quick plot shows a general downward trend suggesting that there is a negative correlation in this case between accuracy gained and distance delta. However the confidence interval surrounding the line is massive, so I’m not sure I could make a definitive statement here. What does seem to be clear is that Club B averages about 7 yards closer to the target than Club A.